Coding approaches to fault tolerance in dynamic systems
Abstract
A fault-tolerant system tolerates internal failures while preserving desirable overall behavior. Fault tolerance is necessary in life-critical or inaccessible applications, and it also enables the design of reliable systems out of unreliable, less expensive components. This thesis discusses fault tolerance in dynamic systems, such as finite-state controllers or computer simulations, whose internal state influences their future behavior. Modular redundancy (system replication) and other traditional techniques for fault tolerance are expensive. They also rely heavily on the assumption that the error-correcting mechanism (e.g., voting) is fault-free, particularly in the case of dynamic systems operating over extended time horizons. The thesis develops a systematic methodology for adding structured redundancy to a dynamic system and introducing associated fault tolerance. Our approach exposes a wide range of possibilities between no redundancy and full replication. Assuming that the error-correcting mechanism is fault-free, we parameterize the different possibilities in various settings, including algebraic machines, linear dynamic systems and Petri nets. By adopting specific error models and, in some cases, by making explicit connections with hardware implementations, we demonstrate how the redundant systems can be designed to allow detection/correction of a fixed number of failures. We do not explicitly address optimization criteria that could be used in choosing among different redundant implementations, but our examples illustrate how such criteria can be investigated in future work. The last part of the thesis relaxes the traditional assumption that error correction be fault-free. We use unreliable system replicas and unreliable voters to construct redundant dynamic systems that evolve in time with low probability of failure. Our approach generalizes modular redundancy by using distributed voting schemes.
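To make "structured redundancy" concrete in the linear-dynamic-system setting, the Python sketch below embeds a system x[k+1] = A x[k] + B u[k] in a larger system whose state xi = [x; c] carries one extra checksum state c, kept equal to the sum of the original states. A parity check P = [-1, ..., -1, 1] satisfies P xi = 0 in fault-free operation, so a nonzero syndrome flags an error. This is a hypothetical minimal instance, not the thesis's parameterization. The single checksum row, the helper names (`redundant_step`, `syndrome`) and the assumption that the checksum is computed by independent circuitry are all choices made for illustration. With only one parity check it can detect, but not correct, a single additive error.

```python
# Hypothetical sketch: a linear system x[k+1] = A x[k] + B u[k] embedded in a
# redundant system with state xi = [x; c], where the extra state c tracks the
# sum of the original states. Integer arithmetic keeps the check exact.


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def matvec(M, v):
    """Matrix-vector product with M given as a list of rows."""
    return [dot(row, v) for row in M]


def column_sums(M):
    """Row vector 1^T M, i.e. the sum of each column of M."""
    return [sum(col) for col in zip(*M)]


def redundant_step(A, B, xi, u):
    """Advance the redundant state xi = [x; c] by one time step.

    The original states are updated as A x + B u. The checksum is updated
    as (1^T A) x + (1^T B) u, using only the previous original states, so
    that c[k+1] equals sum(x[k+1]) whenever no fault occurs.
    """
    n = len(A)
    x = xi[:n]
    x_next = [ax + bu for ax, bu in zip(matvec(A, x), matvec(B, u))]
    c_next = dot(column_sums(A), x) + dot(column_sums(B), u)
    return x_next + [c_next]


def syndrome(xi):
    """Parity check P xi with P = [-1, ..., -1, 1]; nonzero flags a fault."""
    return xi[-1] - sum(xi[:-1])


A = [[1, 1], [0, 1]]
B = [[0], [1]]
xi = redundant_step(A, B, [1, 2, 3], [1])  # start from x = [1, 2], c = 3
print(xi, syndrome(xi))                     # [3, 3, 6] 0
xi[0] += 5                                  # inject an additive error
print(syndrome(xi))                         # -5, so the fault is detected
```

Adding more independent checksum rows (a larger parity-check matrix P with P G = 0) is what would allow locating and correcting errors rather than only detecting them.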
Combining these techniques with low-complexity error-correcting coding, we are able to efficiently protect identical unreliable linear finite-state machines that operate in parallel on distinct input sequences. The approach requires only a constant amount of redundant hardware per machine to achieve a probability of failure that remains below any pre-specified bound over any given finite time interval.
Thesis Supervisor: George C. Verghese
Title: Professor of Electrical Engineering
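Coding across identical linear finite-state machines works because of linearity. Over GF(2), the XOR of the machines' states evolves exactly like one more copy of the machine driven by the XOR of their inputs. The Python sketch below illustrates only that property, using a single hypothetical parity machine and helper names (`lfsm_step`, `simulate`) invented for illustration. It does not reproduce the low-complexity codes, distributed voting, or unreliable checking described in the abstract. A single parity machine detects a fault but cannot tell which machine failed.

```python
from functools import reduce

# Hypothetical sketch: identical linear finite-state machines over GF(2),
# q[k+1] = A q[k] + B u[k] (mod 2), plus one parity machine fed with the
# XOR of all inputs. By linearity its state stays equal to the XOR of the
# machines' states, so a mismatch reveals a fault.


def gf2_matvec(M, v):
    """Matrix-vector product over GF(2); entries are 0 or 1."""
    return [sum(m & x for m, x in zip(row, v)) % 2 for row in M]


def lfsm_step(A, B, q, u):
    """One step of the linear FSM: A q + B u over GF(2)."""
    return [a ^ b for a, b in zip(gf2_matvec(A, q), gf2_matvec(B, u))]


def xor_vecs(vs):
    """Component-wise XOR of a collection of equal-length bit vectors."""
    return [reduce(lambda a, b: a ^ b, bits) for bits in zip(*vs)]


def simulate(A, B, q0s, input_seqs):
    """Run the machines in parallel on distinct input sequences.

    q0s holds one initial state per machine; input_seqs holds one input
    sequence per machine, all of the same length. Returns the final machine
    states and the final state of the parity machine.
    """
    qs = [list(q) for q in q0s]
    p = xor_vecs(qs)  # parity machine starts at the XOR of initial states
    for step_inputs in zip(*input_seqs):
        qs = [lfsm_step(A, B, q, u) for q, u in zip(qs, step_inputs)]
        p = lfsm_step(A, B, p, xor_vecs(step_inputs))
    return qs, p


A = [[0, 1], [1, 1]]
B = [[1], [0]]
q0s = [[0, 0], [1, 0], [0, 1]]
input_seqs = [[[1], [0], [1]], [[0], [0], [1]], [[1], [1], [0]]]
qs, p = simulate(A, B, q0s, input_seqs)
print(xor_vecs(qs) == p)  # True: parity state equals XOR of machine states
qs[1][0] ^= 1             # flip one state bit in machine 1
print(xor_vecs(qs) == p)  # False: the mismatch flags the fault
```

Here the redundant hardware is one extra machine shared by all protected machines. Correcting faults, and tolerating faults in the checking logic itself, requires the stronger codes and voting schemes the thesis describes.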
Publication date: 1999